51 research outputs found

    Fewest repetitions in infinite binary words

    A square is the concatenation of a nonempty word with itself. A word has period p if its letters at distance p match. The exponent of a nonempty word is the quotient of its length over its smallest period. In this article we give a proof of the fact that there exists an infinite binary word which contains finitely many squares and simultaneously avoids words of exponent larger than 7/3. Our infinite word contains 12 squares, which is the smallest possible number of squares to get the property, and 2 factors of exponent 7/3. These are the only factors of exponent larger than 2. The value 7/3 introduces what we call the finite-repetition threshold of the binary alphabet. We conjecture it is 7/4 for the ternary alphabet, like its repetitive threshold
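
The three definitions in this abstract (square, period, exponent) can be made concrete with a brute-force sketch. The function names below are illustrative assumptions, not taken from the article, and the code is meant for checking short finite words only, not for reproducing the article's proof.

```python
from fractions import Fraction


def smallest_period(w: str) -> int:
    """Smallest p >= 1 such that letters at distance p match (w[i] == w[i + p])."""
    for p in range(1, len(w) + 1):
        if all(w[i] == w[i + p] for i in range(len(w) - p)):
            return p
    raise ValueError("the exponent is only defined for nonempty words")


def exponent(w: str) -> Fraction:
    """Length of w divided by its smallest period, kept as an exact fraction."""
    return Fraction(len(w), smallest_period(w))


def distinct_squares(w: str) -> set:
    """All distinct factors of w of the form uu with u nonempty (brute force)."""
    found = set()
    n = len(w)
    for i in range(n):
        for half in range(1, (n - i) // 2 + 1):
            if w[i:i + half] == w[i + half:i + 2 * half]:
                found.add(w[i:i + 2 * half])
    return found
```

For instance, the word 0110110 has smallest period 3 and length 7, so `exponent("0110110")` is 7/3, and its distinct squares are 11, 011011 and 110110.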

    Avoiding conjugacy classes on the 5-letter alphabet

    We construct an infinite word w over the 5-letter alphabet such that for every factor f of w of length at least two, there exists a cyclic permutation of f that is not a factor of w. In other words, w does not contain a non-trivial conjugacy class. This proves the conjecture in Gamard et al. [TCS 2018
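
The property stated in this abstract can be tested on a finite word with a short brute-force sketch. The helper names are hypothetical, and checking a finite prefix up to a bounded factor length is only a sanity check under those assumptions, not the construction or proof from the paper.

```python
def factors(w: str, length: int) -> set:
    """Set of all factors (contiguous substrings) of w with the given length."""
    return {w[i:i + length] for i in range(len(w) - length + 1)}


def conjugates(f: str) -> set:
    """All cyclic permutations of f, i.e. its conjugacy class."""
    return {f[i:] + f[:i] for i in range(len(f))}


def find_conjugacy_class(w: str, max_len: int):
    """Return a factor f (2 <= |f| <= max_len) whose whole conjugacy class
    occurs in w, or None if no such factor exists within that length bound."""
    for length in range(2, max_len + 1):
        facs = factors(w, length)
        for f in sorted(facs):  # sorted only to make the result deterministic
            if conjugates(f) <= facs:
                return f
    return None
```

For example, `find_conjugacy_class("abcab", 3)` returns `"abc"`, since abc, bca and cab all occur in abcab, whereas `find_conjugacy_class("abcde", 5)` returns `None`.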

    Characterization of some binary words with few squares

    Thue proved that the factors occurring infinitely many times in square-free words over {0,1,2} avoiding the factors in {010,212} are the factors of the fixed point of the morphism 0 → 012, 1 → 02, 2 → 1. He similarly characterized square-free words avoiding {010,020} and {121,212} as the factors of two morphic words. In this paper, we exhibit smaller morphisms to define these two square-free morphic words and we give such characterizations for six types of binary words containing few distinct squares
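
The morphism 0 → 012, 1 → 02, 2 → 1 quoted above can be iterated from the letter 0 to obtain prefixes of its fixed point. The following is a minimal sketch; the function names, the seed and the number of rounds are assumptions chosen for illustration, and the square test is a brute-force check suitable only for short prefixes.

```python
# Morphism from the abstract: 0 -> 012, 1 -> 02, 2 -> 1.
MORPHISM = {"0": "012", "1": "02", "2": "1"}


def iterate(seed: str = "0", rounds: int = 7) -> str:
    """Apply the morphism `rounds` times; since the image of 0 starts with 0,
    each result is a prefix of the fixed point."""
    w = seed
    for _ in range(rounds):
        w = "".join(MORPHISM[c] for c in w)
    return w


def has_square(w: str) -> bool:
    """True if w contains a factor uu with u nonempty (brute force)."""
    n = len(w)
    for i in range(n):
        for half in range(1, (n - i) // 2 + 1):
            if w[i:i + half] == w[i + half:i + 2 * half]:
                return True
    return False
```

Three rounds give the prefix 012021012102. On such prefixes one can check that `has_square` returns `False` and that neither 010 nor 212 occurs as a factor.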

    Infinite binary words containing repetitions of odd period

    A square is the concatenation of a nonempty word with itself. A word has period p if its letters at distance p match. The exponent of a nonempty word is its length divided by its smallest period. In this article, we give some new results on the trade-off between the number of squares and the number of cubes in infinite binary words whose square factors have odd periods

    Finite-Repetition threshold for infinite ternary words

    The exponent of a word is the ratio of its length to its smallest period. The repetitive threshold r(a) of an a-letter alphabet is the smallest rational number for which there exists an infinite word whose finite factors have exponent at most r(a). This notion was introduced in 1972 by Dejean, who gave the exact values of r(a) for every alphabet size a, as was eventually proved in 2009. The finite-repetition threshold for an a-letter alphabet refines the above notion. It is the smallest rational number FRt(a) for which there exists an infinite word whose finite factors have exponent at most FRt(a) and that contains a finite number of factors with exponent r(a). It is known from Shallit (2008) that FRt(2)=7/3. Each finite-repetition threshold is associated with the smallest number of r(a)-exponent factors that can be found in the corresponding infinite word. It has been proved by Badkobeh and Crochemore (2010) that this number is 12 for infinite binary words whose maximal exponent is 7/3. We show that FRt(3)=r(3)=7/4 and that the bound is achieved with an infinite word containing only two 7/4-exponent words, the smallest number. Based on deep experiments we conjecture that FRt(4)=r(4)=7/5. The question remains open for alphabets with more than four letters. Keywords: combinatorics on words, repetition, repeat, word powers, word exponent, repetition threshold, pattern avoidability, word morphisms. Comment: In Proceedings WORDS 2011, arXiv:1108.341

    Binary Jumbled String Matching for Highly Run-Length Compressible Texts

    The Binary Jumbled String Matching problem is defined as: Given a string $s$ over $\{a,b\}$ of length $n$ and a query $(x,y)$, with $x,y$ non-negative integers, decide whether $s$ has a substring $t$ with exactly $x$ $a$'s and $y$ $b$'s. Previous solutions created an index of size $O(n)$ in a pre-processing step, which was then used to answer queries in constant time. The fastest algorithms for construction of this index have running time $O(n^2/\log n)$ [Burcsi et al., FUN 2010; Moosa and Rahman, IPL 2010], or $O(n^2/\log^2 n)$ in the word-RAM model [Moosa and Rahman, JDA 2012]. We propose an index constructed directly from the run-length encoding of $s$. The construction time of our index is $O(n+\rho^2\log \rho)$, where $O(n)$ is the time for computing the run-length encoding of $s$ and $\rho$ is the length of this encoding; this is no worse than previous solutions if $\rho = O(n/\log n)$ and better if $\rho = o(n/\log n)$. Our index $L$ can be queried in $O(\log \rho)$ time. While $|L| = O(\min(n, \rho^{2}))$ in the worst case, preliminary investigations have indicated that $|L|$ may often be close to $\rho$. Furthermore, the algorithm for constructing the index is conceptually simple and easy to implement. In an attempt to shed light on the structure and size of our index, we characterize it in terms of the prefix normal forms of $s$ introduced in [Fici and Lipták, DLT 2011]. Comment: v2: only small cosmetic changes; v3: new title, weakened conjectures on size of Corner Index (we no longer conjecture it to be always linear in size of RLE); removed experimental part on random strings (these are valid but limited in their predictive power w.r.t. general strings); v3 published in IP
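
The linear-size index used by the previous solutions mentioned above can be sketched naively. It relies on a known property of binary strings: sliding a window of fixed length by one position changes its number of $a$'s by at most one, so every count between the minimum and the maximum is attained. The code below is an assumption-laden illustration of that baseline with a quadratic-time construction; it is not the run-length-based index proposed in the paper, and the function names are hypothetical.

```python
def build_index(s: str):
    """For each window length k, store the min and max number of 'a's
    over all length-k substrings of s (naive O(n^2) construction)."""
    n = len(s)
    prefix = [0]
    for c in s:
        prefix.append(prefix[-1] + (c == "a"))
    lo = [0] * (n + 1)
    hi = [0] * (n + 1)
    for k in range(1, n + 1):
        counts = [prefix[i + k] - prefix[i] for i in range(n - k + 1)]
        lo[k], hi[k] = min(counts), max(counts)
    return lo, hi


def query(index, x: int, y: int) -> bool:
    """Does s have a substring with exactly x 'a's and y 'b's?"""
    lo, hi = index
    k = x + y
    if k >= len(lo):  # longer than the string itself
        return False
    return lo[k] <= x <= hi[k]
```

With `index = build_index("aabab")`, the query `(2, 1)` succeeds because of the substring aab, while `(0, 2)` fails because bb never occurs.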

    Left Lyndon tree construction

    We extend the left-to-right Lyndon factorisation of a word to the left Lyndon tree construction of a Lyndon word. It yields an algorithm to sort the prefixes of a Lyndon word according to the infinite ordering defined by Dolce et al. (2019). A straightforward variant computes the left Lyndon forest of a word. All algorithms run in linear time on a general alphabet (letter-comparison model)
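
As background for the factorisation this abstract starts from, here is a minimal sketch of the standard Duval algorithm, which computes the Lyndon factorisation of a word in linear time using only letter comparisons. It is a swapped-in textbook technique shown for orientation; it does not build the left Lyndon tree or forest described in the paper, and the function name is an assumption.

```python
def lyndon_factorisation(w: str) -> list:
    """Duval's algorithm: split w into a non-increasing sequence of Lyndon words."""
    n = len(w)
    i = 0
    out = []
    while i < n:
        j, k = i + 1, i
        # Extend the current run while it stays a prefix of a power of a Lyndon word.
        while j < n and w[k] <= w[j]:
            if w[k] < w[j]:
                k = i        # strictly larger letter: the whole run is one Lyndon word so far
            else:
                k += 1       # equal letter: keep matching against the current period
            j += 1
        # Emit as many copies of the Lyndon word of length j - k as fit.
        while i <= k:
            out.append(w[i:i + j - k])
            i += j - k
    return out
```

For example, `lyndon_factorisation("banana")` returns `["b", "an", "an", "a"]`.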

    Corpus-building and corpus-based musicology for the Early Modern Period: Towards a complete Electronic Corpus of Lute Music... and beyond

    Sustainable musicology must make use of as wide a set of contributors, and hear as many voices, as possible. As the online Electronic Corpus of Lute Music approaches its 20th year, a change of approach – embracing enthusiast and scholarly collections alike – is increasing the size of the encoded corpus tenfold and could allow us to provide metadata on almost all of the over 60,000 items in the known lute repertory. This approach brings challenges and limitations, as well as opportunities for scholarship beyond what has previously been possible. The new sub-corpora have diverse editorial strategies and metadata quality, sometimes lacking basic information such as instrumental tuning. On the other hand, a combination of resources to give even 15-20% of the known repertory, combined with metadata to evaluate biases in that sample, could prove invaluable for corpus studies, and also help discover hitherto unrecognised connections and quotations between works. As many vocal pieces of the period are now available online in digital facsimile, the lute corpus also presents a tantalising key for exploring the wider repertory of the period. Through Optical Music Recognition, we are gathering an expanding corpus of >500,000 pages transcribed from early-modern sources. Again, the nature of the material and how it has been gathered places limitations on the uses that can be made of it. Nonetheless, appropriate pattern discovery methods can support search and certain kinds of analysis. Large-scale, cross-corpus analysis between vocal and instrumental works presents a particularly exciting opportunity, but requires adaptations to existing approaches. Lute tablature makes no distinction between the voices of a composition, making many conventional melodic features unavailable without further processing. 
    Building and using these corpora requires new approaches to computational musicology – not just algorithmic approaches, but also social and organisational – to ensure a strong future for corpus-based research